Speech recognition overview

Speech recognition (SR) is the ability of the operating system to convert spoken words to written text. An internal driver, called an SR engine, recognizes words and converts them to text. The SR engine may be installed with the operating system or at a later time with other software. During installation, speech-enabled packages, such as word processors and web browsers, may install their own engines or use existing ones. Additional engines are also available from third-party manufacturers. These engines often use a specialized jargon or vocabulary; for example, a vocabulary specializing in medical or legal terminology. They can also accommodate regional accents, such as British English, or support a different language altogether, such as German, French, or Russian.

You need a microphone or some other sound input device to receive the sound. In general, the microphone should be a high-quality device with built-in noise filters. The speech recognition rate is directly related to the quality of the input: with a poor microphone, the recognition rate will be significantly lower, perhaps even unacceptable. The Microsoft Speech Recognition Training Wizard (Voice Training Wizard) guides you through the process, recommends the best position for the microphone, and lets you test it for optimal results.
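
One rough way to put a number on input quality is the signal-to-noise ratio (SNR) between a short recording of speech and a short recording of the room with nobody speaking. The following is a minimal sketch in Python, not part of the Voice Training Wizard or any SR engine. The function names and the idea of passing samples as plain lists of numbers are assumptions made for illustration.

```python
import math


def rms(samples):
    """Return the root-mean-square amplitude of a sequence of samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def snr_db(speech, noise):
    """Return the signal-to-noise ratio in decibels.

    Hypothetical helper: 'speech' and 'noise' are sequences of amplitude
    values (for example, floats between -1.0 and 1.0) captured with the
    same microphone in the same position.
    """
    noise_rms = rms(noise)
    speech_rms = rms(speech)
    if noise_rms == 0:
        return math.inf  # perfectly silent background
    if speech_rms == 0:
        return -math.inf  # nothing was picked up at all
    return 20 * math.log10(speech_rms / noise_rms)
```

For instance, speech whose RMS amplitude is ten times that of the background noise gives a ratio of 20 dB. A higher value suggests cleaner input; what counts as "good enough" depends on the engine and is not specified here.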

Once you have installed the system and it is working, it is important to train it for your environment and speaking style. On the Speech Recognition tab, click Train Profile and use the Voice Training Wizard to train the system to recognize background noises such as a fan, the hum of air conditioning, or other office sounds. The system also adapts to your speaking style, including accents, pronunciations, and even idiomatic phrases.

Speech recognition tips

Speech recognition is not designed for completely hands-free operation; you'll get the best results if you use a combination of your voice and the mouse or keyboard. A consistent quality of speech also produces the best results. When people speak to one another, they usually understand each other from context and environment, even when the words are whispered, shouted, or spoken quickly or slowly. Speech recognition, however, understands words best when they are spoken in a predictable manner.

Speak in a consistent, level tone. Speaking too loudly or too softly makes it difficult for the computer to recognize what you've said.
Use a consistent rate, without speeding up and slowing down.
Speak without pausing between words; a phrase is easier for the computer to interpret than just one word. For example, the computer has a hard time understanding a sentence spoken as "This (pause) is (pause) another (pause) example (pause) sentence."
Start by working in a quiet environment so that the computer hears you instead of the sounds around you, and use a good-quality microphone. Keep the microphone in the same position; try not to move it around once it is adjusted.
Train your computer to recognize your voice by reading aloud the prepared training text in the Voice Training Wizard. Additional training increases speech recognition accuracy.
As you dictate, do not be concerned if you do not immediately see your words on the screen. Continue speaking and pause at the end of your thought. The computer will display the recognized text on the screen after it finishes processing your voice.
Pronounce words clearly, but do not separate each syllable in a word. For example, sounding out each syllable in "e-nun-ci-ate" will make it harder for the computer to recognize what you've said.
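
The advice above about speaking in a consistent, level tone can be checked roughly by measuring the loudness of a recording in short frames and flagging frames that fall outside a comfortable range. The following is a minimal sketch in Python under stated assumptions: samples are amplitude values between -1.0 and 1.0, and the function names and the threshold values 0.05 and 0.8 are hypothetical choices for illustration, not values used by any SR engine.

```python
import math


def frame_levels(samples, frame_size):
    """Return the RMS level of each consecutive, non-overlapping frame.

    A trailing partial frame (shorter than frame_size) is ignored.
    """
    if frame_size <= 0:
        raise ValueError("frame_size must be positive")
    levels = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        levels.append(math.sqrt(sum(s * s for s in frame) / frame_size))
    return levels


def classify_levels(levels, too_soft=0.05, too_loud=0.8):
    """Label each frame level as 'too soft', 'ok', or 'too loud'.

    The default thresholds are illustrative assumptions only.
    """
    labels = []
    for level in levels:
        if level < too_soft:
            labels.append("too soft")
        elif level > too_loud:
            labels.append("too loud")
        else:
            labels.append("ok")
    return labels
```

A recording in which most frames are labeled "ok" is closer to the level tone recommended in the tips. Many "too soft" frames in the middle of a sentence may also point to pauses between words.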